This paper presents a group-theoretical vector space model (VSM) that extends the VSM with a group action on a vector space of the VSM. We use group theory and representation theory to represent dynamic transformations of information objects, in which each information object is represented by a vector in a vector space of the VSM. Several groups and their matrix representations are employed to represent different kinds of dynamic transformations of information objects used in the VSM. We provide concrete examples of how a dynamic transformation of information objects is performed and discuss algebraic properties of certain dynamic transformations of information objects used in the VSM.
1. Introduction

Vectors have been widely used in cognitive science [1, 48], machine learning [13], semantics [38], and information retrieval (IR) [26, 52, 54]. The vector space model (VSM) [52, 53] represents information objects (e.g., terms, images, documents, and queries) by vectors in a vector space. Each dimension of the vector space represents a feature of an information object and corresponds to a basis element of the vector space of the VSM [38]. A wide variety of weighting schemes [5, 8, 34, 55] have been proposed and tested, in which each component of an information-object vector reflects the importance of the corresponding feature of the information object. For weighted information-object vectors, distance functions are often used to measure the similarity between information-object vectors [62]. One common similarity measure between two information-object vectors is the cosine similarity, which measures the cosine of the angle between the two vectors in a vector space of the VSM [5]. Besides being intuitive, the VSM has also been shown to be effective in IR and relevance ranking [43, 53]. However, relevance in IR is often context-dependent, as information may evolve with the user, place, and time [43-45]; this is not reflected in the classic, standard VSM [34, 52, 54]. Although the VSM incorporating context and its variants have already been studied [43-45], a systematic approach to representing dynamic transformations of information objects used in the VSM is still lacking. Moreover, to the best of our knowledge, theoretical foundations for applying group theory to the VSM have not been established.

Email: dkim@airesearch.kr

In this paper we present a group-theoretical VSM and discuss properties of several types of dynamic transformations of information-object vectors in a vector space of the VSM. We also show that some properties are invariant under certain dynamic transformations of information-object vectors. The rest of this paper is organized as follows. Section 2 gives a brief overview of the VSM. In Section 3 we present our group-theoretical VSM to represent dynamic transformations of information objects used in the VSM. In Section 4 we discuss related work. We give concluding remarks in Section 5. In the Appendix we provide the necessary mathematical background on vector spaces, groups, and their representations used in this paper.


2. Vector Space Model (VSM) 

In this section we give a brief overview of the classical vector space model (VSM) used in this paper. Vector spaces play an important role in cognitive science [1, 48], semantics [38], pattern classification [13], and information retrieval (IR). In particular, they are commonly used in IR, which is concerned with methods and procedures for searching for and obtaining the required information from an information resource or corpus [33, 53]. In IR, the Boolean retrieval model [40] poses queries in the form of Boolean expressions of terms, in which each query consists of terms combined with Boolean operators such as AND, OR, and NOT. In the Boolean retrieval model each document is regarded as a set of words [40]. However, the Boolean retrieval model has limitations, such as the lack of a similarity measure and of a document-ranking method [54]. In contrast, the VSM provides a means to measure the similarity between a query and information-object vectors and to rank information-object vectors according to their similarity scores to the query [40, 54].

The VSM has been formalized as a quadruple $\langle B, W, S, F \rangle$ [38], where $B$ denotes a set of basis elements of a vector space $V$ of the VSM, $W$ specifies a weight function, $S$ is a similarity measure that maps a pair of information-object vectors to a scalar-valued quantity representing their similarity, and $F$ is a transformation that takes one vector space to another vector space. One of the main purposes of $F$ is to reduce the dimensionality of $V$ [38, 56]. $F$ may also be the identity map of $V$. Note that a vector space of the VSM is often regarded as a feature space [58]. Therefore, a wide variety of feature weighting (or scaling) schemes [37] can be inherited, depending on what kind of feature space is employed for the VSM.

(1) Basis elements $B$: $B$ is a set of basis elements $b_1, \ldots, b_n$ that determines the dimensionality of a vector space $V$ of the VSM. Each dimension of $V$ represents a feature of an information object. Each information-object vector $v$ is generated by $B$, i.e., $v = \sum_{i=1}^{n} w_i b_i$, where the $w_i$ for $i = 1, \ldots, n$ are weights or coefficients. Note that if $B'$ is another set of basis elements $b'_1, \ldots, b'_n$ of $V$, then $v$ can also be generated by $B'$, i.e., $v = \sum_{i=1}^{n} w'_i b'_i$. One basis can be converted into another basis to reflect a contextual change of information-object vectors, where a context may refer to the time, space, semantics of information objects, and so on [43, 45]. This means that a basis of a vector space in the VSM can be constructed to represent a context [45].
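As a minimal sketch of such a change of basis (our own illustration, not taken from the cited works), the Python code below writes one hypothetical vector of $\mathbb{R}^2$ in the standard basis and in a hypothetical orthonormal basis rotated by 45 degrees. For an orthonormal basis the new coefficients are simply $w'_i = v \cdot b'_i$; a non-orthonormal basis would instead require solving a linear system. The helper names are hypothetical.

```python
import math


def dot(x, y):
    """Inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(x, y))


def coords_in_orthonormal_basis(v, basis):
    """Coefficients w'_i of v in an orthonormal basis {b'_i}: w'_i = v . b'_i."""
    return [dot(v, b) for b in basis]


def from_coords(coeffs, basis):
    """Rebuild v = sum_i w'_i b'_i from coefficients and basis vectors."""
    dim = len(basis[0])
    return [sum(c * b[k] for c, b in zip(coeffs, basis)) for k in range(dim)]


# Hypothetical information-object vector v = 2 e1 + 4 e2 (standard basis of R^2).
v = [2.0, 4.0]

# Hypothetical "context" basis: the standard basis rotated by 45 degrees.
r = 1 / math.sqrt(2)
new_basis = [[r, r], [-r, r]]

w_new = coords_in_orthonormal_basis(v, new_basis)
print(w_new)                          # coefficients of v in the new basis
print(from_coords(w_new, new_basis))  # recovers v up to floating-point rounding
```

The vector itself does not change; only its coordinates do, which is what allows a basis to encode a context.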

(2) Weight function $W$: $W$ is a weight function that maps an information object to its normalized form, which is often represented as a coordinate vector. Each component of the coordinate vector represents the weight of the corresponding feature of the information object. $W$ is closely related to feature weighting, which depends on the type of feature space. If a vector space $V$ of the VSM is given as an $n$-dimensional term space [53], then a query vector $Q$ and a document vector $D_i$ are represented as $Q = (w_{Q1}, w_{Q2}, \ldots, w_{Qn})$ and $D_i = (w_{i1}, w_{i2}, \ldots, w_{in})$, respectively. Each term represents a feature of an information object, and each component of an information-object vector represents the importance of a term in a document or query vector [34]. Note that $n$ distinct terms are considered in an $n$-dimensional term space. There is a wide variety of ways to determine the weight of a term in a given information object. The simplest approach is frequency weighting [37], in which the weight is simply equal to the frequency of a feature. A common approach to term weighting is the tf-idf method [34, 40], where the weight of a term in a document vector is determined by a local and a global factor. The local factor (term frequency $tf_{ij}$) indicates how often term $j$ appears in document $i$, while the global factor (inverse document frequency $idf_j$) depends on how many documents in the collection contain term $j$ [34]. More specifically, the weight of term $j$ in document $i$ for the tf-idf method is defined as $w_{ij} := tf_{ij} \times idf_j = tf_{ij} \times \log(N/df_j)$, where $N$ is the total number of documents in a document collection and $df_j$ denotes the number of documents (in the collection) containing term $j$ [34, 40]. Note that the inverse document frequency $idf_j := \log(N/df_j)$ assigns a low value to a term that occurs in many documents of the collection and a high value to a term that occurs in only a few documents [34]. The interested reader may also refer to [14, 37, 40] for other term-weighting schemes, such as entropy weighting [37] and logarithmic weighting [14].
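The tf-idf definition above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: the logarithm is taken to base 10 (which matches the idf values $\log(3/2) \approx 0.176$ and $\log 3 \approx 0.477$ used in Example 2.1), the function name `tf_idf_weights` is hypothetical, and the toy documents are made up.

```python
import math
from collections import Counter


def tf_idf_weights(docs, vocab):
    """Return the tf-idf matrix w[i][j] = tf_ij * log10(N / df_j).

    docs  -- list of bags of words (each a list of term strings)
    vocab -- ordered list of distinct terms, i.e., the basis of the term space
    """
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]
    # df_j: number of documents containing term j at least once.
    df = {t: sum(1 for c in counts if c[t] > 0) for t in vocab}
    # idf_j = log10(N / df_j); a term absent from every document gets idf 0.
    idf = {t: math.log10(n_docs / df[t]) if df[t] else 0.0 for t in vocab}
    return [[c[t] * idf[t] for t in vocab] for c in counts]


# Hypothetical toy collection of three documents.
docs = [["apple", "banana", "apple"], ["banana", "cherry"], ["cherry", "cherry", "date"]]
vocab = ["apple", "banana", "cherry", "date"]
for row in tf_idf_weights(docs, vocab):
    print([round(w, 3) for w in row])
```

A term such as "apple" that occurs in only one document receives the larger idf $\log_{10} 3$, while "banana", shared by two documents, receives $\log_{10}(3/2)$.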

(3) Similarity measure $S$: $S$ is a similarity measure that maps each pair of information-object vectors to a scalar-valued similarity score. The angle between a pair of information-object vectors can be used as a simple similarity measure between them. Specifically, the cosine of the angle can be used as a numeric similarity measure (i.e., 1.0 for vectors pointing in the same direction and 0.0 for orthogonal vectors). Furthermore, if two information-object vectors in a vector space of the VSM are normalized to unit length, the cosine of the angle between them is simply their inner product. The cosine similarity between two information-object vectors $v_1$ and $v_2$ is defined as $sim(v_1, v_2) := (v_1 \cdot v_2)/(\|v_1\| \|v_2\|)$, where $v_1 \cdot v_2$ denotes the inner product of $v_1$ and $v_2$. Therefore, in terms of the cosine similarity measure, the higher the value of $sim(v_i, v_j)$, the more similar the information-object vectors $v_i$ and $v_j$ are. Other similarity measures based on a distance function are also available. The interested reader may refer to [63] for further details.
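A minimal Python sketch of the cosine similarity $sim(v_1, v_2)$ defined above follows; the function name `cosine_similarity` and the sample vectors are our own, and we assume (as the definition requires) that neither vector is zero.

```python
import math


def cosine_similarity(v1, v2):
    """sim(v1, v2) = (v1 . v2) / (||v1|| ||v2||) for non-zero vectors v1, v2."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0.0 or norm2 == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm1 * norm2)


print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # same direction: 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))  # orthogonal: 0.0

# For unit vectors, the cosine similarity reduces to the inner product.
u1 = [0.6, 0.8]
u2 = [0.8, 0.6]
print(cosine_similarity(u1, u2), sum(a * b for a, b in zip(u1, u2)))
```

The first call shows that the measure ignores vector length: $(1, 0)$ and $(2, 0)$ have different norms but similarity 1.0.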

(4) Transformation $F$: $F$ is a transformation¹ that transforms one vector space $V$ into another vector space $V'$. The main purpose of $F$ is to reduce the dimensionality of $V$, so that the dimensionality of $V'$ is smaller than that of $V$. Matrix decomposition techniques are often used for dimensionality reduction (e.g., singular value decomposition [38] and QR decomposition [5]). In some cases it is also possible to reduce the dimensionality in preprocessing steps (e.g., stop-word elimination and stemming [54]). $F$ can also be the identity transformation that maps a vector space $V$ to itself.

Although preprocessing and dimensionality reduction steps are often necessary for the VSM, we omit them in this paper. The interested reader may refer to [5, 38, 54] for further details. Unless otherwise stated, in the quadruple $\langle B, W, S, F \rangle$ of the VSM used in this paper, $B$ denotes a set of basis elements of a given vector space, $W$ the tf-idf weight function, $S$ the cosine similarity, and $F$ the identity map. We assume that every vector space of the VSM is finite-dimensional in this paper.

¹ Since a transformation $F$ is often used for dimensionality reduction, it is distinguished from an (invertible) linear transformation in this paper. Note that an invertible linear transformation (i.e., an isomorphism [15]) from a vector space $V$ to itself serves as an element of the general linear group $GL(V)$ (see Appendix).

Example 2.1. This example illustrates how the tf-idf weighting method and the cosine similarity measure of the VSM are applied to document ranking, where each document and the query are represented by a bag of words (unordered words with duplicates allowed) [58]. The bag-of-words model is widely used in document and image representation [19, 20], spam filtering [17], etc. Figure 1 shows query $Q$ and three documents $D_1$, $D_2$, and $D_3$, each of which is represented by a bag of words.

Q: {term1 term2}

D1: {term5 term1 term1 term5}

D2: {term2 term3 term3 term4 term6}

D3: {term2 term1 term2}

Figure 1. Bag of words for a query $Q$ and documents $D_1$, $D_2$, and $D_3$.

There are six distinct terms in Figure 1. Table 1 shows term weights for each document 
and query using the tf-idf weighting method [26, 40]. 


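Table 1. Term weights for $Q$, $D_1$, $D_2$, and $D_3$ in Figure 1.

Term weights $w_{ij} := tf_{ij} \times idf_j$ ($tf_{ij}$: term frequency, $idf_j$: inverse document frequency)
Total number of documents $N = 3$, $idf_j := \log_{10}(N/df_j)$ ($df_j$: number of documents containing term $j$)

Term    tf(Q)  tf(D1)  tf(D2)  tf(D3)  df_j  idf_j  w(Q)   w(D1)  w(D2)  w(D3)
term1   1      2       0       1       2     0.176  0.176  0.352  0      0.176
term2   1      0       1       2       2     0.176  0.176  0      0.176  0.352
term3   0      0       2       0       1     0.477  0      0      0.954  0
term4   0      0       1       0       1     0.477  0      0      0.477  0
term5   0      2       0       0       1     0.477  0      0.954  0      0
term6   0      0       1       0       1     0.477  0      0      0.477  0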
Using Table 1, we compute the cosine similarity between $Q$ and $D_i$ for $1 \le i \le 3$. Since $\|D_i\| = \sqrt{\sum_j w_{ij}^2}$ and $\|Q\| = \sqrt{\sum_j w_{Qj}^2}$, we have

$\|Q\| = \sqrt{0.176^2 + 0.176^2} \approx \sqrt{0.062} \approx 0.249,$
$\|D_1\| = \sqrt{0.352^2 + 0.954^2} \approx 1.017,$
$\|D_2\| = \sqrt{0.176^2 + 0.954^2 + 0.477^2 + 0.477^2} \approx \sqrt{1.396} \approx 1.182,$
$\|D_3\| = \sqrt{0.176^2 + 0.352^2} \approx \sqrt{0.155} \approx 0.394.$

Since $Q \cdot D_i = \sum_j w_{Qj} w_{ij}$, we have

$Q \cdot D_1 = 0.176 \times 0.352 \approx 0.062,$
$Q \cdot D_2 = 0.176 \times 0.176 \approx 0.031,$
$Q \cdot D_3 = (0.176 \times 0.176) + (0.176 \times 0.352) \approx 0.093.$

Now, the cosine similarity between query $Q$ and document $D_i$ for $1 \le i \le 3$ is computed as follows:


² By a slight abuse of notation, we use the same notation for a document (respectively, a query) and its document vector (respectively, query vector) in this paper. The distinction is clear from the context.





$sim(Q, D_1) = (Q \cdot D_1)/(\|Q\| \|D_1\|) \approx 0.062/(0.249 \times 1.017) \approx 0.245,$
$sim(Q, D_2) = (Q \cdot D_2)/(\|Q\| \|D_2\|) \approx 0.031/(0.249 \times 1.182) \approx 0.105,$
$sim(Q, D_3) = (Q \cdot D_3)/(\|Q\| \|D_3\|) \approx 0.093/(0.249 \times 0.394) \approx 0.949.$

For the given query $Q$, $D_3$ has the highest rank with $sim(Q, D_3) \approx 0.949$, while $D_2$ has the lowest rank with $sim(Q, D_2) \approx 0.105$. Therefore, according to the cosine similarity measure, document $D_3$ is the most similar to query $Q$, while document $D_2$ is the least similar to query $Q$.
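The computation of Example 2.1 can be reproduced with the short Python sketch below. It is our own illustration, not code from the paper: the term counts in each bag are those implied by the term weights of the matrix $D$ in Example 3.1, the logarithm is assumed to be base 10, and idf is computed over the three documents only, as in the example. Because the sketch uses unrounded weights, its intermediate values differ slightly from the rounded ones above, but the rounded scores and the ranking agree.

```python
import math
from collections import Counter

# Bags of words for query Q and documents D1, D2, D3 (term counts as in Example 2.1).
query = ["term1", "term2"]
docs = {
    "D1": ["term5", "term1", "term1", "term5"],
    "D2": ["term2", "term3", "term3", "term4", "term6"],
    "D3": ["term2", "term1", "term2"],
}
vocab = [f"term{j}" for j in range(1, 7)]

# idf_j = log10(N / df_j), computed over the document collection only.
n_docs = len(docs)
df = {t: sum(1 for bag in docs.values() if t in bag) for t in vocab}
idf = {t: math.log10(n_docs / df[t]) for t in vocab}


def weights(bag):
    """tf-idf vector w_j = tf_j * idf_j over the fixed vocabulary."""
    tf = Counter(bag)
    return [tf[t] * idf[t] for t in vocab]


def cosine(v1, v2):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    return dot / (math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2)))


q_vec = weights(query)
scores = {name: cosine(q_vec, weights(bag)) for name, bag in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
for name in ranking:
    print(name, round(scores[name], 3))
# D3 0.949
# D1 0.245
# D2 0.105
```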


3. Group Actions on a Vector Space of the VSM 

In this section we use several groups to represent dynamic transformations of information objects. We show that some properties are preserved under certain dynamic transformations, where these dynamic transformations of information objects are represented by a group action on a vector space of the VSM. For the bag-of-words model, we assume that although the content of an information object can be changed by a dynamic transformation, no new term is introduced during a dynamic transformation of information objects. We first describe how an orthogonal group acts on a vector space $V = \mathbb{R}^n$ of the VSM.

For each $n$, the set of all $n \times n$ orthogonal matrices with real entries forms a subgroup of $GL(n, \mathbb{R})$, denoted by $O(n, \mathbb{R})$, where a square matrix $M$ is called orthogonal if $M^{-1} = M^{T}$ [22]. A linear transformation $T: \mathbb{R}^n \to \mathbb{R}^n$ is called an orthogonal transformation on $\mathbb{R}^n$ if its transformation matrix in the standard (ordered) basis is an orthogonal matrix with real entries [3]. Orthogonal matrices include rotation and permutation matrices [3].

Proposition 3.1. Let $V = \mathbb{R}^n$ be a vector space of the VSM. If $O(n, \mathbb{R})$ acts on $V$ by matrix multiplication, then it preserves the cosine similarity between information-object vectors in $V$.

Proof. By the definition of the orthogonal group, if $O(n, \mathbb{R})$ acts on $V$ by matrix multiplication, we have $Mv_1 \cdot Mv_2 = (Mv_1)^T (Mv_2) = v_1^T M^T M v_2 = v_1^T v_2 = v_1 \cdot v_2$ for $M \in O(n, \mathbb{R})$ and $v_1, v_2 \in V$. Since $(Mv)^T (Mv) = v^T v$, we have $\|Mv\| = \|v\|$ for $v \in V$. Therefore, an orthogonal matrix preserves both the inner product and the length of information-object vectors. It follows that for information-object vectors $u, v \in V$ and $g \in O(n, \mathbb{R})$, we have $sim(gu, gv) = (gu \cdot gv)/(\|gu\| \|gv\|) = (u \cdot v)/(\|u\| \|v\|) = sim(u, v)$. □
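Proposition 3.1 can be checked numerically. The sketch below is our own illustration: it uses a rotation of $\mathbb{R}^3$ about the third axis with an arbitrarily chosen angle as the element $g \in O(3, \mathbb{R})$, and two hypothetical information-object vectors.

```python
import math


def matvec(m, v):
    """Matrix-vector product m v."""
    return [sum(m_ik * v_k for m_ik, v_k in zip(row, v)) for row in m]


def cosine(v1, v2):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    return dot / (math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2)))


# A rotation about the third coordinate axis of R^3 is an element of O(3, R).
theta = 0.7  # arbitrary angle chosen for illustration
c, s = math.cos(theta), math.sin(theta)
g = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

# Check g^T g = I, i.e., g is orthogonal.
gtg = [[sum(g[k][i] * g[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
is_orthogonal = all(abs(gtg[i][j] - (1.0 if i == j else 0.0)) < 1e-12
                    for i in range(3) for j in range(3))

# Two hypothetical information-object vectors.
u = [0.352, 0.0, 0.954]
v = [0.176, 0.352, 0.0]
before = cosine(u, v)
after = cosine(matvec(g, u), matvec(g, v))
print(is_orthogonal, before, after)  # before and after agree up to rounding
```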

Example 3.1. (Householder matrix [6, 29]) Consider a reflection linear operator $R \in GL(V)$ of a vector space $V = \mathbb{R}^n$ of the VSM that reflects each information-object vector through the vector hyperplane orthogonal to a unit vector $u$. The transformation matrix $[R]$ of the linear operator $R$ with respect to the standard (ordered) basis of $\mathbb{R}^n$ is called a Householder matrix and is given by $I - 2uu^T$. Let $H$ denote $[R]$. Since $HH^T = I$ (see [6]), we have $H \in O(n, \mathbb{R})$.

For instance, suppose that we have six terms (i.e., term1, term2, term3, term4, term5, and term6) and three documents (i.e., $D_1$, $D_2$, and $D_3$) as shown in Table 1. The $6 \times 3$ term-by-document matrix $D$ is given as follows:

$$D = \begin{pmatrix} 0.352 & 0 & 0.176 \\ 0 & 0.176 & 0.352 \\ 0 & 0.954 & 0 \\ 0 & 0.477 & 0 \\ 0.954 & 0 & 0 \\ 0 & 0.477 & 0 \end{pmatrix}$$



Each column corresponds to a document $D_j$, while each row corresponds to a term term$_i$. Each element $d_{ij}$ in $D$ represents the term weight of term$_i$ associated with document $D_j$. Let $u = [-\sqrt{2}/2, \sqrt{2}/2, 0, 0, 0, 0]^T$ and select the vector hyperplane of $\mathbb{R}^6$ that is orthogonal to $u$. Then, the Householder matrix $H'$ is computed as follows:

$$H' = I - 2uu^T = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$


Now, the transformation of $D$ by $H'$ is computed as follows:

$$D' = H'D = \begin{pmatrix} 0 & 0.176 & 0.352 \\ 0.352 & 0 & 0.176 \\ 0 & 0.954 & 0 \\ 0 & 0.477 & 0 \\ 0.954 & 0 & 0 \\ 0 & 0.477 & 0 \end{pmatrix}$$


The first column of $D'$ represents the transformation of $D_1$ by $H'$, the second column of $D'$ the transformation of $D_2$ by $H'$, and the third column of $D'$ the transformation of $D_3$ by $H'$. In effect, $H'$ replaces term1 with term2, and vice versa³, in $D_1$, $D_2$, and $D_3$. Since $H' \in O(6, \mathbb{R})$, the cosine similarities among $D_1$, $D_2$, and $D_3$ are preserved among $H'D_1$, $H'D_2$, and $H'D_3$ by Proposition 3.1.
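The following Python sketch (our own, with hypothetical helper names) builds $H' = I - 2uu^T$ from the vector $u$ of Example 3.1, applies it to $D$, and checks both claims of the example: the rows of term1 and term2 are swapped, and the pairwise cosine similarities of the documents are unchanged.

```python
import math


def householder(u):
    """Householder matrix H = I - 2 u u^T for a unit vector u."""
    n = len(u)
    return [[(1.0 if i == j else 0.0) - 2.0 * u[i] * u[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Product of an (m x k) matrix a and a (k x n) matrix b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def column(m, j):
    return [row[j] for row in m]


def cosine(v1, v2):
    dot = sum(a * b for a, b in zip(v1, v2))
    return dot / (math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2)))


# Term-by-document matrix D of Example 3.1 (rows: term1..term6, columns: D1..D3).
D = [[0.352, 0.0, 0.176],
     [0.0, 0.176, 0.352],
     [0.0, 0.954, 0.0],
     [0.0, 0.477, 0.0],
     [0.954, 0.0, 0.0],
     [0.0, 0.477, 0.0]]

r = math.sqrt(2) / 2
H = householder([-r, r, 0.0, 0.0, 0.0, 0.0])
D_prime = matmul(H, D)

# H' should swap the rows of term1 and term2 and leave the other rows unchanged.
expected = [D[1], D[0]] + D[2:]
swap_ok = all(abs(a - b) < 1e-12 for ra, rb in zip(D_prime, expected) for a, b in zip(ra, rb))

# Cosine similarities between documents are unchanged (Proposition 3.1).
pairs = [(0, 1), (0, 2), (1, 2)]
sims_before = [cosine(column(D, i), column(D, j)) for i, j in pairs]
sims_after = [cosine(column(D_prime, i), column(D_prime, j)) for i, j in pairs]
sims_preserved = all(abs(a - b) < 1e-12 for a, b in zip(sims_before, sims_after))
print(swap_ok, sims_preserved)  # True True
```

Tolerances are used because $\sqrt{2}/2$ is not exactly representable in floating point, so the computed entries of $H'$ are only approximately 0 and 1.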

The set of all $n \times n$ invertible diagonal matrices with real entries forms a subgroup of $GL(n, \mathbb{R})$ [49], which is denoted by $D(n, \mathbb{R})$ in this paper. We first describe a scaling matrix [7]. A scaling matrix is a diagonal matrix in which each element $s_i$ on the main diagonal represents a scaling factor for the $i$-th coordinate axis. If $s_i > 1$, it represents a dilation in the direction of the $i$-th coordinate axis. If $0 < s_i < 1$, it represents a contraction in the direction of the $i$-th coordinate axis. If $s_i = -1$, it represents a reflection in the direction of the $i$-th coordinate axis. Note that if a scaling matrix has no zero on its main diagonal, it is invertible.

We say that a linear operator $T: V \to V$ is a diagonalizable scaling linear operator if there exists an (ordered) basis of $V$ with respect to which the transformation matrix of $T$ is an invertible scaling matrix.

Proposition 3.2. Let $V$ be an $n$-dimensional vector space over $\mathbb{R}$ of the VSM. If $D(n, \mathbb{R})$ acts on $V$ by matrix multiplication, then each $d \in D(n, \mathbb{R})$ represents a transformation matrix of a diagonalizable scaling linear operator of $V$.

Proof. By the definition of $D(n, \mathbb{R})$, $d \in D(n, \mathbb{R})$ is a diagonal matrix. Since $D(n, \mathbb{R})$ is a subgroup of $GL(n, \mathbb{R})$, $d$ is invertible. It follows that its determinant is not zero. Therefore, each $d_i$ on the main diagonal of $d$ is nonzero and may represent a scaling factor in the direction of the $i$-th coordinate axis. It follows that $d$ is an invertible scaling matrix that represents a transformation matrix of a diagonalizable scaling linear operator of $V$ with respect to a given basis of $V$. □

³ For a further consideration of a permutation of the basis vectors, consider the symmetric group $S_n$ acting on a vector space $V = \mathbb{R}^n$. Let $B = \{b_1, \ldots, b_n\}$ be a basis of $V$. Then $S_n$ acts on $V$ by $g(\sum_i c_i b_i) = \sum_i c_i b_{g(i)}$, where $g \in S_n$, $c_i \in \mathbb{R}$, and $c_i b_i \in V$. See [2, 15] for further details.


Example 3.2. Each component of an information-object vector can vary with a change of context (e.g., a time-dependent document collection [16, 47]). This example shows a systematic way of changing weights using an invertible scaling matrix. The $6 \times 3$ term-by-document matrix $D$ in Example 3.1 was given as:

$$D = \begin{pmatrix} 0.352 & 0 & 0.176 \\ 0 & 0.176 & 0.352 \\ 0 & 0.954 & 0 \\ 0 & 0.477 & 0 \\ 0.954 & 0 & 0 \\ 0 & 0.477 & 0 \end{pmatrix}$$

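Suppose that an invertible scaling matrix $S$ is given below:

$$S = \begin{pmatrix} 2 & 0 & 0 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 & 0 & 0 \\ 0 & 0 & 2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$

Then, the transformation of $D$ by $S$ is computed as follows:

$$D'' = SD = \begin{pmatrix} 0.704 & 0 & 0.352 \\ 0 & 0.528 & 1.056 \\ 0 & 1.908 & 0 \\ 0 & 0.477 & 0 \\ 0.954 & 0 & 0 \\ 0 & 0.477 & 0 \end{pmatrix}$$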

The first column of $D''$ represents the transformation of $D_1$ by $S$, the second column of $D''$ the transformation of $D_2$ by $S$, and the third column of $D''$ the transformation of $D_3$ by $S$. By means of the scaling matrix $S$, the weights of term1 and term3 in the document collection are multiplied by two, while the weight of term2 is multiplied by three. The weights of term4, term5, and term6 are invariant under $S$.
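Since $S$ is diagonal, left-multiplying $D$ by $S$ simply scales row $i$ of $D$ by $s_i$. The Python sketch below (our own illustration) uses this shortcut instead of a full matrix product, reproduces $D''$, and undoes the change with $S^{-1} = \mathrm{diag}(1/s_1, \ldots, 1/s_6)$, which exists because no diagonal entry is zero.

```python
# Term-by-document matrix D of Example 3.1 (rows: term1..term6, columns: D1..D3).
D = [[0.352, 0.0, 0.176],
     [0.0, 0.176, 0.352],
     [0.0, 0.954, 0.0],
     [0.0, 0.477, 0.0],
     [0.954, 0.0, 0.0],
     [0.0, 0.477, 0.0]]

# Main diagonal of the invertible scaling matrix S = diag(2, 3, 2, 1, 1, 1).
s = [2.0, 3.0, 2.0, 1.0, 1.0, 1.0]

# Left-multiplying by a diagonal matrix scales row i by s_i,
# so the full matrix product S D is not needed.
D_scaled = [[s_i * w for w in row] for s_i, row in zip(s, D)]
for row in D_scaled:
    print([round(w, 3) for w in row])

# S^{-1} = diag(1 / s_i) undoes the change of weights.
D_restored = [[w / s_i for w in row] for s_i, row in zip(s, D_scaled)]
print(all(abs(a - b) < 1e-12 for ra, rb in zip(D_restored, D) for a, b in zip(ra, rb)))
```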

Proposition 3.3. Let $V$ be an $n$-dimensional vector space over $\mathbb{R}$ of the VSM. A square matrix $s \in GL(n, \mathbb{R})$ has $n$ linearly independent eigenvectors if and only if it represents a diagonalizable scaling linear operator of $V$.

Proof. (⇒) Assume that a square matrix $s \in GL(n, \mathbb{R})$ has $n$ linearly independent eigenvectors. Since $s \in GL(n, \mathbb{R})$ by assumption, $s$ is invertible. It follows that the determinant of $s$ is not zero. Since $s$ has $n$ linearly independent eigenvectors by assumption and similar matrices have the same determinant, $s$ is diagonalizable to an invertible diagonal matrix $s' \in D(n, \mathbb{R})$ by Theorem A.2⁴ and Lemma A.2. Therefore, by Theorem A.3 and Proposition 3.2, $s$ represents a diagonalizable scaling linear operator of $V$.

(⇐) If $s \in GL(n, \mathbb{R})$ represents a diagonalizable scaling linear operator of $V$, then it is diagonalizable by Theorem A.3. Therefore, $s$ has $n$ linearly independent eigenvectors by Theorem A.2. □

⁴ See the Appendix for Theorems A.1-A.4 and Lemmas A.1-A.3.

Remarks. The above proof of Proposition 3.3 involves Theorem A.3, which in turn involves a change of basis of a vector space. In [41-45] context is modeled by a basis of a vector space of the VSM. If context is modeled in this way, Proposition 3.3 implies that an invertible linear operator of a vector space of the VSM whose matrix has $n$ linearly independent eigenvectors becomes a simple scaling operator, with an invertible diagonal transformation matrix, after a suitable change of context.


Similarly to Proposition 3.3, we obtain the following lemma from Lemma A.1.

Lemma 3.1. Let $V$ be an $n$-dimensional vector space over $\mathbb{R}$ of the VSM. If $s \in GL(n, \mathbb{R})$ is symmetric, it represents a diagonalizable scaling linear operator of $V$.

In Example 3.1 we considered a Householder matrix $H$ given by $I - 2uu^T$, where $u$ is a unit vector orthogonal to the selected vector hyperplane. Since $(I - 2uu^T)^T = I^T - 2(u^T)^T u^T = I - 2uu^T$, $H$ is symmetric. Since the determinant of a Householder matrix is $-1$ [29], we have $H \in GL(n, \mathbb{R})$. Therefore, by Lemma 3.1, $H$ may represent a diagonalizable scaling linear operator of $V = \mathbb{R}^n$.
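The sketch below (ours, with a hypothetical unit vector $u \in \mathbb{R}^3$) shows concretely why a Householder matrix is a diagonalizable scaling operator: $Hu = -u$, while $Hw = w$ for every $w$ orthogonal to $u$. Hence, with respect to a basis consisting of $u$ and two vectors orthogonal to $u$, the transformation matrix of $H$ is $\mathrm{diag}(-1, 1, 1) \in D(3, \mathbb{R})$, i.e., a reflection in the sense of the scaling factors described above.

```python
def householder(u):
    """Householder matrix H = I - 2 u u^T for a unit vector u."""
    n = len(u)
    return [[(1.0 if i == j else 0.0) - 2.0 * u[i] * u[j] for j in range(n)] for i in range(n)]


def matvec(m, v):
    """Matrix-vector product m v."""
    return [sum(m_ik * v_k for m_ik, v_k in zip(row, v)) for row in m]


def close(x, y, tol=1e-12):
    """Component-wise comparison of two vectors within a tolerance."""
    return all(abs(a - b) < tol for a, b in zip(x, y))


u = [1 / 3, 2 / 3, 2 / 3]  # hypothetical unit vector: 1/9 + 4/9 + 4/9 = 1
H = householder(u)

is_symmetric = all(abs(H[i][j] - H[j][i]) < 1e-12 for i in range(3) for j in range(3))

# Basis B' = {u, w1, w2}: w1 and w2 are orthogonal to u and to each other.
w1 = [2.0, -1.0, 0.0]
w2 = [2.0, 4.0, -5.0]

# Eigenvalue -1 along u and +1 on the hyperplane orthogonal to u,
# so the matrix of H with respect to B' is diag(-1, 1, 1).
print(is_symmetric,
      close(matvec(H, u), [-x for x in u]),
      close(matvec(H, w1), w1),
      close(matvec(H, w2), w2))
```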

Example 3.3. Suppose that a feature space $V = \mathbb{R}^4$ of the VSM has four features (location1, location2, height, and brightness) and some normalized feature vectors. Let $B = \{e_1, e_2, e_3, e_4\}$ denote the standard (ordered) basis of $\mathbb{R}^4$. The four features in the feature space are interpreted in such a manner that $e_1 :=$ location1, $e_2 :=$ location2, $e_3 :=$ height, and $e_4 :=$ brightness. Suppose also that the transformation matrix $[T]_B$ of a linear operator $T: V \to V$ with respect to $B$ is given as follows:

$$[T]_B = \begin{pmatrix} 3 & 1 & 0 & 0 \\ 1 & 3 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$


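Since $[T]_B$ is an invertible symmetric matrix, it is diagonalizable to an invertible diagonal matrix in $D(4, \mathbb{R})$ by Lemmas A.1 and A.2. In other words, there is a transition matrix $P$ from an ordered basis $B' = \{e'_1, e'_2, e'_3, e'_4\}$ to $B = \{e_1, e_2, e_3, e_4\}$ such that $[T]_{B'} = P^{-1}[T]_B P$ is an invertible diagonal matrix, i.e., $[T]_{B'} \in D(4, \mathbb{R})$. By using the diagonalization procedure (see [3]), we have

$$P = \begin{pmatrix} 1/\sqrt{2} & -1/\sqrt{2} & 0 & 0 \\ 1/\sqrt{2} & 1/\sqrt{2} & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \quad \text{and} \quad [T]_{B'} = \begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

Since the columns of $P$ are the coordinate vectors of $e'_1, \ldots, e'_4$ with respect to $B$, it follows that $e'_1 = (1/\sqrt{2})e_1 + (1/\sqrt{2})e_2$, $e'_2 = -(1/\sqrt{2})e_1 + (1/\sqrt{2})e_2$, $e'_3 = e_3$, and $e'_4 = e_4$. Therefore, $[T]_{B'} \in D(4, \mathbb{R})$ is a transformation matrix of a diagonalizable scaling linear operator of $V$ with respect to $B'$ by Proposition 3.2.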
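The diagonalization in Example 3.3 can be verified with the short Python sketch below (our own illustration). Because the columns of $P$ are orthonormal, $P^{-1} = P^T$, so $[T]_{B'}$ is computed as $P^T [T]_B P$; the sketch also checks that $e'_1$ and $e'_2$ are eigenvectors of $T$ with eigenvalues 4 and 2.

```python
import math


def matmul(a, b):
    """Product of an (m x k) matrix a and a (k x n) matrix b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


r = 1 / math.sqrt(2)
T_B = [[3.0, 1.0, 0.0, 0.0],
       [1.0, 3.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]
# Columns of P are the coordinates of e'_1, ..., e'_4 with respect to B.
P = [[r, -r, 0.0, 0.0],
     [r, r, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]

# P has orthonormal columns, so P^{-1} = P^T.
T_Bp = matmul(matmul(transpose(P), T_B), P)
off_diagonal_zero = all(abs(T_Bp[i][j]) < 1e-12 for i in range(4) for j in range(4) if i != j)
print(off_diagonal_zero, [round(T_Bp[i][i], 10) for i in range(4)])  # True [4.0, 2.0, 1.0, 1.0]

# Column j of P is e'_j; check T e'_1 = 4 e'_1 and T e'_2 = 2 e'_2.
e1p = [row[0] for row in P]
e2p = [row[1] for row in P]
Te1p = [sum(t * x for t, x in zip(row, e1p)) for row in T_B]
Te2p = [sum(t * x for t, x in zip(row, e2p)) for row in T_B]
print(all(abs(a - 4 * b) < 1e-12 for a, b in zip(Te1p, e1p)),
      all(abs(a - 2 * b) < 1e-12 for a, b in zip(Te2p, e2p)))  # True True
```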


Let $B(n, \mathbb{R})$ be the set of all $n \times n$ invertible upper triangular matrices with real entries. $B(n, \mathbb{R})$ forms a subgroup of $GL(n, \mathbb{R})$, called the standard Borel subgroup [2] of $GL(n, \mathbb{R})$.



Let $V = \mathbb{R}^n$ be a vector space of the VSM and $B = \{e_1, \ldots, e_n\}$ be its fixed standard (ordered) basis. Then there is an ascending chain of subspaces $\{0\} \subset V_1 \subset V_2 \subset \cdots \subset V_{n-1} \subset V_n = \mathbb{R}^n$, in which each $V_i \cong \mathbb{R}^i$ for $1 \le i \le n$ is spanned by the basis elements $e_1, \ldots, e_i$. This ascending chain is called the standard complete flag [2] of $V = \mathbb{R}^n$. The following proposition states that if information-object vectors in $V = \mathbb{R}^n$ are transformed by an $n \times n$ invertible upper triangular matrix $m \in B(n, \mathbb{R})$, the standard complete flag of $V = \mathbb{R}^n$ is preserved.

Proposition 3.4. Let $V = \mathbb{R}^n$ be a vector space of the VSM and let $\{0\} \subset V_1 \subset V_2 \subset \cdots \subset V_{n-1} \subset V_n = \mathbb{R}^n$ be the standard complete flag of $V = \mathbb{R}^n$. Then $gV_i = V_i$ for $1 \le i \le n$, where $g \in B(n, \mathbb{R})$ is an $n \times n$ invertible upper triangular matrix with real entries.

Proof. It follows immediately from the fact that the standard Borel subgroup $B(n, \mathbb{R})$ of $GL(n, \mathbb{R})$ stabilizes the standard complete flag of $V = \mathbb{R}^n$ (see [2] for further details). □

Remarks. In IR, high-dimensional information-object vectors in a vector space of the VSM are often projected onto a low-dimensional subspace in order to improve computational efficiency [58]. Now, consider information-object vectors in a vector space $V_n = \mathbb{R}^n$ of the VSM and project them onto a subspace $V_i$ ($1 \le i \le n$) of $V_n = \mathbb{R}^n$. By Proposition 3.4, the projected information-object vectors remain in that subspace after being transformed by a linear operator of $V_n = \mathbb{R}^n$, provided that the transformation matrix of the linear operator with respect to the standard (ordered) basis is an invertible upper triangular matrix with real entries.

For instance, if document $D_3$ in Example 3.1 is transformed by $t \in B(6, \mathbb{R})$, it still resides in the subspace spanned by the basis elements corresponding to term1 and term2. This is not the case if document $D_3$ is transformed by, say, the $6 \times 6$ invertible lower triangular matrix obtained by replacing the $(3, 1)$-entry of the $6 \times 6$ identity matrix with 1: the transformed vector acquires a nonzero third component ($0.176$), the weight it had on term1.
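The contrast between upper and lower triangular matrices can be seen in the Python sketch below (our own illustration). The helper `flag_level` is hypothetical: it returns the smallest $i$ such that a vector lies in $V_i = \mathrm{span}\{e_1, \ldots, e_i\}$. The upper triangular matrix with ones on and above the diagonal is an arbitrary element of $B(6, \mathbb{R})$ chosen for illustration.

```python
def matvec(m, v):
    """Matrix-vector product m v."""
    return [sum(m_ik * v_k for m_ik, v_k in zip(row, v)) for row in m]


def flag_level(v, tol=1e-12):
    """Smallest i with v in V_i = span{e_1, ..., e_i} (0 for the zero vector)."""
    nonzero = [i for i, x in enumerate(v) if abs(x) > tol]
    return nonzero[-1] + 1 if nonzero else 0


n = 6
# D3 of Example 3.1: only term1 and term2 have nonzero weight, so D3 lies in V_2.
d3 = [0.176, 0.352, 0.0, 0.0, 0.0, 0.0]

# A hypothetical element of B(6, R): ones on and above the main diagonal.
upper = [[1.0 if j >= i else 0.0 for j in range(n)] for i in range(n)]

# The lower triangular matrix from the text: identity with the (3, 1)-entry set to 1.
lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
lower[2][0] = 1.0

print(flag_level(d3), flag_level(matvec(upper, d3)), flag_level(matvec(lower, d3)))  # 2 2 3

# Every basis vector e_i stays in V_i under the upper triangular matrix.
basis = [[1.0 if k == i else 0.0 for k in range(n)] for i in range(n)]
print(all(flag_level(matvec(upper, e)) <= i + 1 for i, e in enumerate(basis)))  # True
```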

We next describe the dual space $V^*$ of a vector space $V$ over $\mathbb{R}$ of the VSM. Each information-object vector in $V$ may be associated with a scalar-valued quantity. For instance, if a bag of words consists of terms involving product or service items in a recommender system [23, 36, 60], each term may be associated with a cost (e.g., a purchase price). Now, consider the query and documents in Figure 1, where the vector space of the VSM is $V = \mathbb{R}^6$. Let $B = \{u_1, \ldots, u_6\}$ be an ordered basis of $\mathbb{R}^6$. The six terms are interpreted in such a manner that $u_1 :=$ term1, ..., $u_6 :=$ term6. Using the frequency weighting scheme, we have $Q = u_1 + u_2$, $D_1 = 2u_1 + 2u_5$, $D_2 = u_2 + 2u_3 + u_4 + u_6$, and $D_3 = u_1 + 2u_2$. Suppose that the costs of the terms are 3 for $u_1$ and 4, 5, 6, 6, and 7 for $u_2$, $u_3$, $u_4$, $u_5$, and $u_6$, respectively. An important linear functional in the dual space $V^*$ of $V$ is $\phi = 3u_1^* + 4u_2^* + 5u_3^* + 6u_4^* + 6u_5^* + 7u_6^*$, where $\{u_1^*, \ldots, u_6^*\}$ is the dual basis of $B$, so that $\langle \phi, u_1 \rangle = 3$, $\langle \phi, u_2 \rangle = 4$, and so on. By pairing $\phi$ with a term, the cost of the term is recovered. Similarly, by pairing $\phi$ with an information-object vector, the total cost of the information-object vector is obtained. For instance, the total cost of $D_2$ is $\langle \phi, D_2 \rangle = 4 + 2 \times 5 + 6 + 7 = 27$. The following proposition describes the relationship between a representation $\rho: G \to GL(V)$ of $G$ and the dual representation $\rho^*(g) = [\rho(g^{-1})]^T: V^* \to V^*$ of $G$ used in the VSM.
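In coordinates, pairing the cost functional $\phi$ with an information-object vector is a sum of products, because $\langle u_i^*, u_j \rangle$ is 1 when $i = j$ and 0 otherwise. The Python sketch below (ours; the helper name `pair` is hypothetical) computes the total cost of the query and each document under the costs given above.

```python
# Frequency-weighted vectors in the basis u_1, ..., u_6 (term1, ..., term6).
vectors = {
    "Q": [1, 1, 0, 0, 0, 0],
    "D1": [2, 0, 0, 0, 2, 0],
    "D2": [0, 1, 2, 1, 0, 1],
    "D3": [1, 2, 0, 0, 0, 0],
}

# Coordinates of the cost functional phi in the dual basis u_1*, ..., u_6*.
phi = [3, 4, 5, 6, 6, 7]


def pair(functional, vector):
    """<phi, v> = sum_i phi_i v_i, since <u_i*, u_j> = 1 if i == j and 0 otherwise."""
    return sum(f * x for f, x in zip(functional, vector))


for name, vec in vectors.items():
    print(name, pair(phi, vec))
# Q 7
# D1 18
# D2 27
# D3 11
```

Because the pairing is linear in the vector, the total cost of a combined bag of words is the sum of the individual costs.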

Proposition 3.5. Let $V$ be a vector space over $\mathbb{R}$ of the VSM and $V^*$ be the dual space of $V$. Let $\rho: G \to GL(V)$ be a representation of $G$ and let $\rho^*: G \to GL(V^*)$ be the dual representation of $G$ to $\rho$ acting on $V^*$ given by

$$\rho^*(g) = [\rho(g^{-1})]^T.$$

Then $\langle \rho^*(g)(\phi), \rho(g)(v) \rangle = \langle \phi, v \rangle$ for all $g \in G$, $v \in V$, and $\phi \in V^*$.

Proof. See Lemma A.3. □


Example 3.4. Let $V = \mathbb{R}^2$ be a two-dimensional vector space with standard (ordered) basis elements $e_1 = [1, 0]^T$ and $e_2 = [0, 1]^T$, and let $V^*$ be the dual space of $V$ with ordered basis elements $e_1^*$ and $e_2^*$ selected by Theorem A.4. Let $u$ be an information-object vector in $V$ and $\psi$ be a linear functional in $V^*$ such that $u = e_1 + e_2$ and $\psi = 2e_1^* + 4e_2^*$. We use the frequency weighting scheme for $u$ and $\psi$. Therefore, the document corresponding to $u$ consists of the term corresponding to $e_1$ and the term corresponding to $e_2$. Similarly, $\psi$ can be interpreted in such a manner that the cost of the term corresponding to $e_1$ is 2 and the cost of the term corresponding to $e_2$ is 4. The total cost of $u$ is obtained by pairing $\psi$ with $u$, that is, $\langle \psi, u \rangle = \langle 2e_1^* + 4e_2^*, e_1 + e_2 \rangle = 6$. Let $\rho: D(2, \mathbb{R}) \to GL(2, \mathbb{R})$ be a matrix representation of $D(2, \mathbb{R})$ associated with $V$ such that $\rho(g) = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$ for some $g \in D(2, \mathbb{R})$. Then $\rho(g)$ transforms $u = e_1 + e_2$ into $u' = e_1 + 2e_2$. We see that $u'$ now consists of a single $e_1$ and two $e_2$'s. Let $\rho^*: D(2, \mathbb{R}) \to GL(2, \mathbb{R})$ be the dual matrix representation of $D(2, \mathbb{R})$ associated with $V^*$ as in Proposition 3.5. Then $\rho^*(g) = [\rho(g^{-1})]^T = \begin{pmatrix} 1 & 0 \\ 0 & 1/2 \end{pmatrix}$ satisfies $\langle \rho^*(g)(\psi), \rho(g)(u) \rangle = \langle \psi, u \rangle$. Note that $\rho(g), \rho^*(g) \in D(2, \mathbb{R})$ for $g \in D(2, \mathbb{R})$. Now, $\rho^*(g)$ transforms $\psi = 2e_1^* + 4e_2^*$ into $\psi' = 2e_1^* + (4/2)e_2^* = 2e_1^* + 2e_2^*$. This means that the cost of the term corresponding to $e_2$ has to be reduced to half of its original value so that the value of $\langle \psi, u \rangle$ is invariant, i.e., $\langle \rho^*(g)(\psi), \rho(g)(u) \rangle = \langle \psi', u' \rangle = \langle \psi, u \rangle$.
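Example 3.4 can be checked with exact rational arithmetic, as in the Python sketch below (our own illustration with hypothetical helper names). It uses the fact that $\rho(g^{-1}) = \rho(g)^{-1}$ because $\rho$ is a group homomorphism, and it only handles diagonal matrices, which is all that $D(2, \mathbb{R})$ requires.

```python
from fractions import Fraction


def matvec(m, v):
    """Matrix-vector product m v."""
    return [sum(m_ik * v_k for m_ik, v_k in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def inverse_diagonal(m):
    """Inverse of an invertible diagonal matrix (an element of D(n, R))."""
    n = len(m)
    return [[Fraction(1) / m[i][i] if i == j else Fraction(0) for j in range(n)]
            for i in range(n)]


def pair(functional, vector):
    """<psi, u> in coordinates: sum of products of dual and primal coefficients."""
    return sum(f * x for f, x in zip(functional, vector))


rho_g = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(2)]]  # rho(g)
# rho*(g) = [rho(g^{-1})]^T, with rho(g^{-1}) = rho(g)^{-1}.
rho_star_g = transpose(inverse_diagonal(rho_g))

u = [Fraction(1), Fraction(1)]    # u = e1 + e2
psi = [Fraction(2), Fraction(4)]  # psi = 2 e1* + 4 e2*

u_new = matvec(rho_g, u)           # u' = e1 + 2 e2
psi_new = matvec(rho_star_g, psi)  # psi' = 2 e1* + 2 e2*
print(pair(psi, u), pair(psi_new, u_new))  # 6 6
```

Exact fractions are used so that the invariance $\langle \psi', u' \rangle = \langle \psi, u \rangle$ holds without any rounding tolerance.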


4. Related Work and Discussion 

The proper representation of information objects plays an important role in information retrieval (IR), since good retrieval performance cannot be expected without it.

This paper has assumed that information objects are represented by vectors in a vector space of the VSM and that certain types of transformations of information objects are well defined by linear transformations of a vector space of the VSM. The results of this paper concern the representation of information objects under several types of transformations in a vector space of the VSM for the purposes of IR, semantics, etc.

By using group representation theory and linear algebra, this paper provides a mathematical foundation for the vector space representation of information objects under group actions, allowing known group-theoretical results to be adapted to the vector space representation of information objects used in IR, semantics, etc.

We have discussed several groups of invertible linear transformations on a vector space 
of the VSM in previous sections. 

In [41-45] a context change is modeled by a linear transformation from one basis to another in a vector space of the VSM, in order to reflect information needs evolving with users, time, space, etc.

In [51], permutations of vector coordinates in a word space are used to capture and encode word-order information.

Unitary transformations [28, 59] on a Hilbert space [31], i.e., a complete inner product space, are discussed in the context of IR in [59].

In [59] the notions of quantum mechanics (QM) [18], such as state vectors [18], observables [18, 59], superposition [4], and uncertainty [4, 18], are translated into notions of IR, with the aim of applying some known theorems of QM (e.g., Gleason's theorem [59]) in the IR context. In that book a document is represented by a vector in a Hilbert space, while relevance is represented by a Hermitian operator [59] that encapsulates the uncertainty of relevance. (The interested reader may also refer to [61] for the geometry of conceptual spaces using vector spaces and quantum theory.)

The dual space model for semantic relations and compositions is discussed in [57]; it consists of a domain space and a function space for two distinct similarity measures. However, it does not involve the dual space of linear functionals on a vector space. Meanwhile, the dual space of linear functionals on a vector space appears in Dirac notation [18, 59], which is used for relevance feedback [26] and ostensive retrieval [59] in IR (see [59]).

Although group representation theory involving tensor products [15] of vector spaces is well studied in mathematics [22], we have not considered tensor products of vector spaces for semantics in terms of group representation theory in this paper. In computational and mathematical linguistics [10, 57, 58] the tensor product of vector spaces is often used to model compositionality [46] (see also [24, 25] for tensor-based compositionality). In [9, 10] a compositional distributional model of meaning using category theory [39] is discussed, where the tensor product is employed for the composition of meanings and types. In that framework VSMs are used for a distributional theory of meaning [11], and pregroups [9] are used for a compositional theory of grammatical types [10]. (Both the category of vector spaces and the category of pregroups are examples of compact closed categories [39]. The interested reader may refer to [10] for further details.) We leave it as future work to consider tensor product representations of certain information objects under group actions for the purposes of IR, semantics, machine learning [13], etc.


5. Conclusions 

Although group theory is a major area of research in mathematics, little research has been done on how it can be utilized in the VSM. This paper discussed certain dynamic transformations of information objects used in the VSM by means of group-theoretical methods. In our framework, an information object is considered a dynamic entity rather than a static one, and a dynamic transformation of information objects is represented by an element of a group of invertible linear operators on a vector space of the VSM. Several groups act on a vector space $V$ of the VSM by means of their matrix representations in order to perform dynamic transformations of information-object vectors systematically. We also showed how the dual space $\hat{V}$ of $V$ can be employed in the existing VSM. We leave as an open question which other groups, not discussed in this paper, can act on a vector space of the VSM, and which useful properties involving dynamic transformations of information objects used in the VSM can be derived from them.


Appendix. Vector Spaces, Groups, and Representations 

In this appendix we summarize the mathematical background used in this paper. The definitions and results in this appendix can be found in [2, 3, 7, 12, 15, 21, 22, 27, 30-32, 35, 49, 50]. We assume that the reader has some familiarity with linear algebra.

A group $(G, \cdot)$ is a nonempty set $G$, closed under a binary operation $\cdot$, such that the following axioms are satisfied: (i) $(a \cdot b) \cdot c = a \cdot (b \cdot c)$ for all $a, b, c \in G$, (ii) there is a unique element $e \in G$, called the identity element of $G$, such that $e \cdot x = x \cdot e = x$ for all $x \in G$, (iii) for each element $a \in G$, there is an element $a^{-1} \in G$ such that $a \cdot a^{-1} = a^{-1} \cdot a = e$. A group $G$ is abelian if its binary operation $\cdot$ is commutative, that is, $a \cdot b = b \cdot a$ for all $a, b \in G$.
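As an illustrative sketch (not part of the original paper), the axioms above can be checked by brute force on a small finite set. The helper names `is_group`, `is_abelian`, `add_mod4`, and `mul_mod4` are hypothetical; the integers modulo 4 serve as the example.

```python
from itertools import product

def is_group(elements, op):
    """Brute-force check of closure and axioms (i)-(iii) on a finite set."""
    elements = list(elements)
    # Closure: op must map G x G back into G.
    if any(op(a, b) not in elements for a, b in product(elements, repeat=2)):
        return False
    # (i) Associativity.
    if any(op(op(a, b), c) != op(a, op(b, c))
           for a, b, c in product(elements, repeat=3)):
        return False
    # (ii) An identity element e with e*x = x*e = x for all x.
    identities = [e for e in elements
                  if all(op(e, x) == x == op(x, e) for x in elements)]
    if not identities:
        return False
    e = identities[0]
    # (iii) Every element a has some b with a*b = b*a = e.
    return all(any(op(a, b) == e == op(b, a) for b in elements) for a in elements)

def is_abelian(elements, op):
    return all(op(a, b) == op(b, a) for a, b in product(elements, repeat=2))

def add_mod4(a, b):
    return (a + b) % 4

def mul_mod4(a, b):
    return (a * b) % 4

print(is_group(range(4), add_mod4), is_abelian(range(4), add_mod4))  # True True
print(is_group(range(4), mul_mod4))  # False: 0 (and 2) have no inverse
```

Removing 0 and switching to a prime modulus repairs the multiplicative case: the nonzero residues modulo 5 form a group under multiplication.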

Let $I_n = \{1, 2, \ldots, n\}$. The group of all bijections $I_n \to I_n$, whose binary operation is function composition, is called the symmetric group on $n$ letters and is denoted by $S_n$.
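A minimal Python sketch of $S_3$, assuming permutations are stored as tuples over the 0-based labels $\{0, 1, 2\}$ rather than $\{1, 2, 3\}$; the function `compose` is a hypothetical helper implementing $(s \circ t)(i) = s(t(i))$.

```python
from itertools import permutations

def compose(s, t):
    """Function composition (s o t)(i) = s(t(i)) for permutations stored as tuples."""
    return tuple(s[t[i]] for i in range(len(t)))

n = 3
S3 = list(permutations(range(n)))  # all bijections of {0, ..., n-1}
identity = tuple(range(n))

print(len(S3))                                      # 6 = 3!
print(all(compose(identity, s) == s for s in S3))   # True
# S_3 is not abelian: some pair of permutations does not commute.
print(any(compose(s, t) != compose(t, s) for s in S3 for t in S3))  # True
```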

Let $G$ be a group and $H$ be a nonempty subset of $G$. If $H$ is a group under the restriction to $H$ of the binary operation of $G$, then $H$ is called a subgroup of $G$.

Let $(G, \cdot)$ and $(G', \circ)$ be groups. A map $\phi : G \to G'$ is a homomorphism if $\phi(x \cdot y) = \phi(x) \circ \phi(y)$ for all $x, y \in G$.
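A standard example of a homomorphism is the sign map from $(S_n, \circ)$ to the multiplicative group $\{1, -1\}$. The sketch below (hypothetical helpers `compose` and `sign`, with the sign computed from the inversion count of a 0-based permutation tuple) checks $\operatorname{sgn}(s \circ t) = \operatorname{sgn}(s)\operatorname{sgn}(t)$ exhaustively on $S_4$.

```python
from itertools import permutations

def compose(s, t):
    return tuple(s[t[i]] for i in range(len(t)))

def sign(p):
    """Sign of a permutation via its inversion count: +1 if even, -1 if odd."""
    inversions = sum(1 for i in range(len(p))
                     for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

S4 = list(permutations(range(4)))
# phi = sign is a homomorphism from (S_4, o) to ({1, -1}, *).
print(all(sign(compose(s, t)) == sign(s) * sign(t) for s in S4 for t in S4))  # True
```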

A ring is a nonempty set $R$ together with two binary operations $+, \times : R \times R \to R$ (called addition and multiplication) such that: (i) $(R, +)$ is an abelian group, (ii) $(a \times b) \times c = a \times (b \times c)$ for all $a, b, c \in R$, (iii) $a \times (b + c) = (a \times b) + (a \times c)$ and $(a + b) \times c = (a \times c) + (b \times c)$. In addition, (iv) if $a \times b = b \times a$ for all $a, b \in R$, then $R$ is said to be a commutative ring, (v) if $R$ contains an element $1_R$ such that $1_R \times a = a \times 1_R = a$ for all $a \in R$, then $R$ is said to be a ring with unity.

If $(R, +, \times)$ is a ring and $(G, \cdot)$ is a group, we also write $ab$ rather than $a \times b$ for $a, b \in R$, and $ab$ rather than $a \cdot b$ for $a, b \in G$.

An element $x$ in a ring $R$ with unity is said to be left (respectively, right) invertible if there exists an element $z \in R$ (respectively, $y \in R$) such that $zx = 1_R$ (respectively, $xy = 1_R$). An element $x \in R$ that is both left and right invertible is said to be a unit.

A ring $R$ with unity $1_R \neq 0$ in which every nonzero element is a unit is called a division ring. A field is a commutative division ring.
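For a concrete illustration (our example, not the paper's), the units of the commutative ring $\mathbb{Z}_n$ with unity are exactly the residues that have a multiplicative inverse, and $\mathbb{Z}_n$ is a field precisely when every nonzero residue is a unit. Because $\mathbb{Z}_n$ is commutative, left and right invertibility coincide, so one check suffices. The helpers `units` and `is_field` are hypothetical.

```python
from math import gcd

def units(n):
    """Units of the commutative ring Z_n (n >= 2): x with x*y = 1 (mod n) for some y."""
    return [x for x in range(n) if any((x * y) % n == 1 for y in range(n))]

def is_field(n):
    """Z_n is a field iff 1 != 0 (n >= 2) and every nonzero element is a unit."""
    return n >= 2 and units(n) == list(range(1, n))

print(units(12))                                 # [1, 5, 7, 11]
print([n for n in range(2, 20) if is_field(n)])  # [2, 3, 5, 7, 11, 13, 17, 19]
# Cross-check with the classical criterion: x is a unit in Z_n iff gcd(x, n) == 1.
print(all(units(n) == [x for x in range(n) if gcd(x, n) == 1] for n in range(2, 30)))  # True
```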

Let $R$ be a ring. A (left) $R$-module is an additive abelian group $M$ together with a scalar multiplication defined by a function $R \times M \to M$ such that for all $r, s \in R$ and $a, b \in M$: (i) $(rs)a = r(sa)$, (ii) $(r + s)a = ra + sa$, (iii) $r(a + b) = ra + rb$. In addition, if $R$ is a ring with unity and $1_R a = a$ for all $a \in M$, then $M$ is a unitary $R$-module.

If $R$ is a field, a unitary $R$-module $M$ is called a vector space $M$ over $R$.

In the remainder of this paper, $G$ denotes a group, $\mathbb{K}$ a field, and $V$ a finite-dimensional vector space unless otherwise stated.

Let $V, W$ be vector spaces over $\mathbb{K}$. A function $T : V \to W$ is a linear transformation from $V$ to $W$ provided that for all $x, y \in V$ and $k \in \mathbb{K}$: (i) $T(x + y) = T(x) + T(y)$, (ii) $T(kx) = kT(x)$. A linear transformation from $V$ to itself is also called a linear operator of $V$.

A (left) action of a group $G$ on a set $A$ is a function $G \times A \to A$ (given by $(g, x) \mapsto gx$) such that for all $x \in A$ and $g_1, g_2 \in G$: (i) $ex = x$, (ii) $(g_1 g_2)x = g_1(g_2 x)$. When such an action is given, we say that $G$ acts (on the left) on the set $A$.
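The sketch below assumes one particular action of $S_n$ on vectors of $\mathbb{R}^n$, chosen for illustration: $s$ moves coordinate $i$ to position $s(i)$. This convention yields a left action; the alternative $y_i = x_{s(i)}$ would instead satisfy $(g_1 g_2)x = g_2(g_1 x)$, i.e., a right action. The helpers `compose` and `act` and the vector `x` are hypothetical.

```python
from itertools import permutations

def compose(s, t):
    return tuple(s[t[i]] for i in range(len(t)))

def act(s, x):
    """Left action of S_n on R^n: coordinate i of x moves to position s(i)."""
    y = [0.0] * len(x)
    for i, xi in enumerate(x):
        y[s[i]] = xi
    return tuple(y)

x = (0.5, 0.0, 2.0)  # e.g. a term-weight vector in a 3-dimensional VSM (illustrative)
S3 = list(permutations(range(3)))
e = (0, 1, 2)

print(act(e, x) == x)  # axiom (i): True
print(all(act(compose(s, t), x) == act(s, act(t, x))
          for s in S3 for t in S3))  # axiom (ii): True
```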

The general linear group $GL(n, \mathbb{K})$ is the group of all invertible $n \times n$ matrices with entries from $\mathbb{K}$ under matrix multiplication. An $n \times n$ matrix is invertible if and only if its determinant is not zero. Alternatively, the general linear group of $V$ is the group of all invertible linear transformations from $V$ to $V$ and is denoted by $GL(V)$. (If $V$ is a finite $n$-dimensional vector space, then $GL(n, \mathbb{K})$ and $GL(V)$ are isomorphic as groups. See [15] for details.)
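A minimal sketch, assuming $\mathbb{K} = \mathbb{Q}$ (exact rationals via Python's `fractions.Fraction`) as a stand-in for $\mathbb{R}$: the determinant test separates invertible from singular matrices, and the multiplicativity $\det(AB) = \det(A)\det(B)$ shows why $GL(n, \mathbb{K})$ is closed under multiplication. The helpers `det` and `matmul` are hypothetical; cofactor expansion is used only because it is short (it is exponential in $n$).

```python
from fractions import Fraction

def det(M):
    """Determinant by cofactor expansion along the first row (fine for small n)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(1)]]   # det 1: in GL(2, Q)
B = [[Fraction(0), Fraction(1)], [Fraction(-1), Fraction(3)]]  # det 1: in GL(2, Q)
C = [[Fraction(1), Fraction(2)], [Fraction(2), Fraction(4)]]   # det 0: not invertible

print(det(A), det(B), det(C))                # 1 1 0
print(det(matmul(A, B)) == det(A) * det(B))  # True: det is multiplicative
```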

The general linear group $GL(n, \mathbb{K})$ and its subgroups act on $V = \mathbb{K}^n$ by matrix multiplication, considering each vector in $V$ as a column matrix. (That is, if $M \in GL(n, \mathbb{K})$ and $x \in V$, then $(M, x) \mapsto Mx$.)

A linear representation of $G$ is a group homomorphism $\rho : G \to GL(V)$ from $G$ into $GL(V)$. Similarly, a matrix representation of $G$ is a group homomorphism $\rho' : G \to GL(n, \mathbb{K})$ from $G$ into $GL(n, \mathbb{K})$.
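As an illustrative matrix representation (an example of ours, not taken from the paper), each $s \in S_3$ can be sent to the permutation matrix with $\rho'(s)e_i = e_{s(i)}$. The sketch checks the homomorphism property $\rho'(s \circ t) = \rho'(s)\rho'(t)$ for all pairs; the helpers `compose`, `perm_matrix`, and `matmul` are hypothetical, and labels are 0-based.

```python
from itertools import permutations

def compose(s, t):
    return tuple(s[t[i]] for i in range(len(t)))

def perm_matrix(s):
    """rho'(s): 1 in row s(j) of column j, so that rho'(s) e_j = e_{s(j)}."""
    n = len(s)
    return [[1 if s[j] == i else 0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

S3 = list(permutations(range(3)))
# Homomorphism property: rho'(s o t) = rho'(s) rho'(t) for all s, t in S_3.
print(all(perm_matrix(compose(s, t)) == matmul(perm_matrix(s), perm_matrix(t))
          for s in S3 for t in S3))  # True
```

Multiplying a column vector by $\rho'(s)$ moves coordinate $i$ to position $s(i)$, so this representation realizes the coordinate-permuting action of $S_n$ by matrix multiplication.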

Suppose $G$ acts on a vector space $V$ over $\mathbb{K}$. The action of $G$ on $V$ is called linear if the following conditions are met: (i) $g(v + w) = gv + gw$ for all $g \in G$ and $v, w \in V$, (ii) $g(kv) = k(gv)$ for all $g \in G$, $k \in \mathbb{K}$, and $v \in V$. If $G$ acts on $V$ linearly, then $V$ itself is called a representation of $G$, and we write $gv$ or $g \cdot v$ for $\rho(g)(v)$.

Let $V$ be a vector space over $\mathbb{R}$. An inner product on $V$ is a function $(\cdot, \cdot)$ from $V \times V$ into $\mathbb{R}$ which satisfies the following for all $x, y, z \in V$ and for all $k \in \mathbb{R}$: (i) $(kx + y, z) = k(x, z) + (y, z)$, (ii) $(x, y) = (y, x)$, (iii) $(x, x) \geq 0$, (iv) if $(x, x) = 0$, then $x = 0$.

Theorem A.1 ([31]). The equation

$$(x, y) = \sum_{k=1}^{n} x_k y_k,$$

where $x = (x_1, \ldots, x_n), y = (y_1, \ldots, y_n) \in \mathbb{R}^n$, defines an inner product on $\mathbb{R}^n$.

Let $\|x\| = \left(\sum_{i=1}^{n} x_i^2\right)^{1/2}$, where $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$. Then the geometric interpretation of $(u, v)$ is $(u, v) = \|u\| \|v\| \cos\theta$, where $\theta$ is the angle between $u$ and $v$. For the inner product on $V = \mathbb{R}^n$, we write $u \cdot v$ rather than $(u, v)$.
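In the VSM this inner product underlies the cosine similarity between information-object vectors. A minimal sketch, with hypothetical helper names and hypothetical document and query vectors:

```python
import math

def dot(u, v):
    """Inner product of Theorem A.1: (u, v) = sum_k u_k v_k."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def cosine(u, v):
    """cos(theta) = (u, v) / (||u|| ||v||), defined for nonzero u and v."""
    return dot(u, v) / (norm(u) * norm(v))

d = (3.0, 0.0, 4.0)  # hypothetical document vector (term weights)
q = (1.0, 0.0, 0.0)  # hypothetical query vector
print(norm(d))       # 5.0
print(cosine(d, q))  # 0.6
```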

If we change an ordered basis $B = \{b_1, \ldots, b_n\}$ of an $n$-dimensional vector space $V$ to a new ordered basis $B' = \{b'_1, \ldots, b'_n\}$, then a vector $v$ has an old coordinate matrix $[v]_B$ and a new coordinate matrix $[v]_{B'}$. They are related by the equation $[v]_B = S[v]_{B'}$, where $S$ is called the transition matrix from $B'$ to $B$. If $X$ and $Y$ are square matrices (i.e., $n \times n$ matrices), then $Y$ is similar to $X$ if there is an invertible matrix $P$ such that

$$Y = P^{-1}XP.$$

Let $M = (a_{ij})$ be an $n \times n$ matrix. The main diagonal of $M$ consists of the entries $a_{ii}$ for $1 \leq i \leq n$. A matrix $D$ is called diagonal if its nonzero entries appear only on the main diagonal. A matrix $U$ is called upper triangular if all entries of $U$ lying below the main diagonal are zero. A matrix $L$ is called lower triangular if all entries of $L$ lying above the main diagonal are zero.

A square matrix M is called diagonalizable if it is similar to a diagonal matrix. 

A linear operator $T$ of $V$ is called diagonalizable if there exists an (ordered) basis of $V$ with respect to which the transformation matrix of $T$ is a diagonal matrix.

A square matrix $A$ is called symmetric if $A = A^T$.

Let $V$ be a vector space over $\mathbb{K}$. If $T$ is a linear operator of $V$, a nonzero vector $v \in V$ satisfying $Tv = \lambda v$ for some $\lambda \in \mathbb{K}$ is called an eigenvector of $T$. The following theorem states a fundamental characterization of diagonalizable matrices.

Theorem A.2 ([3]). An $n \times n$ matrix $M$ with real entries is diagonalizable if and only if $M$ has $n$ linearly independent eigenvectors.

Lemma A.1 ([3, 7]). Every symmetric matrix with real entries is diagonalizable.
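To illustrate Lemma A.1 and Lemma A.2 on a small case (our example), the symmetric matrix $A$ below has eigenvectors $(1, 1)$ and $(1, -1)$; placing them as the columns of $P$ gives $P^{-1}AP = \operatorname{diag}(3, 1)$, and $A$ and this diagonal matrix share the determinant 3. Exact rationals are used so the equality checks are exact; the helper names are hypothetical.

```python
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def det_2x2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def inverse_2x2(M):
    d = det_2x2(M)  # must be nonzero for M to be invertible
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]

A = [[F(2), F(1)], [F(1), F(2)]]   # symmetric: A equals its transpose
P = [[F(1), F(1)], [F(1), F(-1)]]  # columns: eigenvectors (1, 1) and (1, -1)

D = matmul(inverse_2x2(P), matmul(A, P))
print(D == [[3, 0], [0, 1]])     # True: eigenvalues 3 and 1 on the diagonal
print(det_2x2(A) == det_2x2(D))  # True: similar matrices share the determinant
```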

Lemma A.2 ([3, 7]). Similar matrices have the same determinant. 

Given a linear operator $T : V \to V$, the following theorem describes how the transformation matrix of $T$ changes when we change the basis.

Theorem A.3 ([3]). Let $T : V \to V$ be a linear operator of $V$ and let $B$ and $B'$ be bases for $V$. Then $[T]_B$ and $[T]_{B'}$ are similar, where $[T]_B$ (respectively, $[T]_{B'}$) denotes the transformation matrix of $T$ with respect to $B$ (respectively, $B'$). Specifically, $[T]_{B'} = S^{-1}[T]_B S$, where $S$ is the transition matrix from $B'$ to $B$.
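A sketch of Theorem A.3 in $\mathbb{Q}^2$ (an assumed example): $[T]_B$ is given in the standard basis, and the columns of $S$ are the vectors of $B'$ written in $B$-coordinates, so that $[v]_B = S[v]_{B'}$. The code checks that $S^{-1}[T]_B S$ maps $[v]_{B'}$ to $[Tv]_{B'}$ and that the determinant is unchanged; all names are hypothetical.

```python
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def det_2x2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def inverse_2x2(M):
    d = det_2x2(M)  # nonzero, since S is a transition matrix
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]

T_B = [[F(1), F(2)], [F(0), F(3)]]  # [T]_B in the standard basis B = {e1, e2}
S = [[F(1), F(1)], [F(0), F(1)]]    # columns: B' = {(1, 0), (1, 1)} in B-coordinates
S_inv = inverse_2x2(S)

T_Bp = matmul(S_inv, matmul(T_B, S))  # [T]_B' = S^{-1} [T]_B S
print(T_Bp == [[1, 0], [0, 3]])       # True (B' happens to be an eigenbasis of T)

v_Bp = [[F(2)], [F(5)]]               # coordinates of some v with respect to B'
v_B = matmul(S, v_Bp)                 # [v]_B = S [v]_B'
Tv_Bp = matmul(S_inv, matmul(T_B, v_B))
print(Tv_Bp == matmul(T_Bp, v_Bp))    # True: both give [Tv]_B'
print(det_2x2(T_B) == det_2x2(T_Bp))  # True (Lemma A.2)
```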

Let $V$ be a vector space over $\mathbb{R}$, and let $\mathbb{R}$ be regarded as a one-dimensional vector space over itself. Let $\mathrm{Hom}_{\mathbb{R}}(V, \mathbb{R})$ be the set of all linear transformations from $V$ to $\mathbb{R}$. This set, denoted by $\hat{V}$, forms a vector space over $\mathbb{R}$, which is called the dual space of $V$. Elements of $\hat{V}$ are called linear functionals.

Theorem A.4 ([15]). If $B = \{v_1, \ldots, v_n\}$ is a basis of a vector space $V$ over $\mathbb{R}$, define $\hat{v}_i \in \hat{V}$ for each $i \in \{1, \ldots, n\}$ by its action on the basis $B$ in such a manner that $\hat{v}_i(v_j) = \delta_{ij}$ for $1 \leq j \leq n$, where $\delta_{ij}$ denotes $0 \in \mathbb{R}$ if $i \neq j$ and $1 \in \mathbb{R}$ if $i = j$. Then $\hat{V}$ is a vector space over $\mathbb{R}$ with basis $\hat{B} = \{\hat{v}_1, \ldots, \hat{v}_n\}$.

There is a (bilinear) natural pairing $\langle \cdot, \cdot \rangle$ between $\hat{V}$ and $V$ defined by $\langle \phi, v \rangle \stackrel{\text{def}}{=} \phi(v)$ for $\phi \in \hat{V}$ and $v \in V$. (If $A$ denotes a linear operator of $V$ and $A^T$ denotes its dual or transpose operator of $\hat{V}$, then $(A^T\phi)(v) = \phi(Av)$ for $\phi \in \hat{V}$, $v \in V$ [27].)
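A minimal sketch, assuming $V = \mathbb{Q}^2$ and representing a linear functional $\phi$ by its coefficient vector, so that $\langle \phi, v \rangle = \sum_k \phi_k v_k$. If the basis vectors of $B$ are the columns of an invertible matrix $P$, the rows of $P^{-1}$ give the dual basis, because $P^{-1}P = I$ is exactly the condition $\hat{v}_i(v_j) = \delta_{ij}$. In these coordinates the dual operator acts on the coefficient vector by the ordinary matrix transpose. The helper names are hypothetical.

```python
from fractions import Fraction as F

def pair(phi, v):
    """Natural pairing <phi, v> = phi(v) for a functional with coefficient vector phi."""
    return sum(p * x for p, x in zip(phi, v))

def inverse_2x2(M):
    d = M[0][0] * M[1][1] - M[0][1] * M[1][0]  # nonzero for a basis matrix
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]

def transpose(M):
    return [list(col) for col in zip(*M)]

def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Basis B = {v1, v2} of Q^2, stored as the columns of P.
v1, v2 = [F(1), F(1)], [F(1), F(2)]
P = [[v1[0], v2[0]], [v1[1], v2[1]]]

# The dual basis functionals are the rows of P^{-1}, since P^{-1} P = I.
dual = inverse_2x2(P)
print([[pair(dual[i], vj) for vj in (v1, v2)] for i in range(2)] == [[1, 0], [0, 1]])  # True

# Transpose (dual) operator: (A^T phi)(v) = phi(A v).
A = [[F(0), F(1)], [F(2), F(3)]]
phi, v = [F(4), F(-1)], [F(2), F(5)]
print(pair(mat_vec(transpose(A), phi), v) == pair(phi, mat_vec(A, v)))  # True
```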

Let $\hat{V} = \mathrm{Hom}_{\mathbb{R}}(V, \mathbb{R})$ be the dual space of $V$ and let $\rho : G \to GL(V)$ be a representation of $G$. The dual representation $\hat{\rho} : G \to GL(\hat{V})$ to $\rho : G \to GL(V)$ is the representation of $G$ acting on $\hat{V}$ given by $\hat{\rho}(g) = [\rho(g^{-1})]^T : \hat{V} \to \hat{V}$, where $\hat{\rho}(g)$ is the transpose of $\rho(g^{-1})$.

The following lemma describes the relationship between a representation $\rho : G \to GL(V)$ of $G$ and the dual representation $\hat{\rho} : G \to GL(\hat{V})$ of $G$.

Lemma A.3 ([22]). $\langle \hat{\rho}(g)(\hat{v}), \rho(g)(v) \rangle = \langle \hat{v}, v \rangle$ for all $g \in G$, $v \in V$, and $\hat{v} \in \hat{V}$.
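The sketch below checks Lemma A.3 numerically for a hypothetical representation of the additive group $(\mathbb{Z}, +)$ on $\mathbb{R}^2$ given by $\rho(k) = A^k$ with $A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$, using integer matrices so the arithmetic stays exact. Since the inverse of $k$ is $-k$, the dual representation is $\hat{\rho}(k) = [\rho(-k)]^T$; a non-orthogonal $A$ is chosen so that $\hat{\rho}(k)$ genuinely differs from $\rho(k)$. All names are illustrative assumptions.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def pair(phi, v):
    """Natural pairing <phi, v> = phi(v)."""
    return sum(p * x for p, x in zip(phi, v))

A, A_inv = [[1, 1], [0, 1]], [[1, -1], [0, 1]]  # A is invertible with inverse A_inv

def rho(k):
    """Hypothetical representation of (Z, +): rho(k) = A^k."""
    M = [[1, 0], [0, 1]]
    for _ in range(abs(k)):
        M = matmul(M, A if k > 0 else A_inv)
    return M

def rho_hat(k):
    """Dual representation: rho_hat(k) = [rho(k^{-1})]^T, where k^{-1} = -k in (Z, +)."""
    return transpose(rho(-k))

phi, v = [3, -2], [5, 7]
print(all(pair(mat_vec(rho_hat(k), phi), mat_vec(rho(k), v)) == pair(phi, v)
          for k in range(-4, 5)))  # True (Lemma A.3)
print(all(rho_hat(j + k) == matmul(rho_hat(j), rho_hat(k))
          for j in range(-3, 4) for k in range(-3, 4)))  # True: rho_hat is a homomorphism
```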

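The definition of the dual representation is such that the following diagram commutes [22]:

$$
\begin{array}{ccc}
V & \xrightarrow{\;\phi\;} & \mathbb{R} \\
{\scriptstyle g}\big\downarrow & & \big\downarrow{\scriptstyle g} \\
V & \xrightarrow{\;g\phi\;} & \mathbb{R}
\end{array}
$$

that is, $(g\phi)(gv) = g(\phi(v))$ for all $v \in V$. Therefore, $(g\phi)(v) = g(\phi(g^{-1}v))$ for all $g \in G$ and $v \in V$. Since $gx = x$ for all $x \in \mathbb{R}$, we have $(g\phi)(v) = g(\phi(g^{-1}v)) = \phi(g^{-1}v)$. Since $\phi(g^{-1}v) = ((g^{-1})^T\phi)(v)$, we have $g\phi = (g^{-1})^T\phi$, which corresponds to the above definition.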